Introduction to Computational Social Sciences

Malo Jan & Luis Sattelmayer

2025-01-02

Introduction to CSS

  • Introduction to the field of CSS
  • Mostly about text analysis
  • Only the basics, as a starting point to explore further
  • Requires some familiarity with programming
  • Goal: provide the least technical introduction possible, with a focus on use cases

What is CSS?

  • Use of computing power to analyze social phenomena
  • Relatively recent and interdisciplinary
  • Not a “field” but rather a set of tools and methods
  • But dedicated journals, networks (SICSS) and now academic positions have emerged

Evolution over the last decades

  • More data
    • Data on the web
    • Data from the web
  • Computing power
    • More powerful computers and hardware
    • More powerful algorithms, models, AI

New opportunities

  • New data sources
  • Unstructured data: not produced by the researcher
  • Collecting the whole population rather than a sample

Developments in political science

  • Early development of computational text analysis in political science in the 2000s: converting text to numbers to perform statistical analysis

  • But it has really boomed over the last few years, as advances in AI make more complex analyses possible

  • Even more recent developments:

    • Multilingual text analysis
    • Images and video as data, multimodal analysis
    • Generative language models
  • Follow the different steps in Gilardi and Wuest (2018)?

  • Transformation of text corpora into numbers to perform statistical analysis

  • For this we treat text as data: each document must be given a numerical representation. This step, called featurization, represents a collection of documents in numerical form

  • To learn from the data the function linking a text to its label, the machine needs numbers rather than text, so we must transform our texts into a numerical representation

  • Learning feature representations of text relies on Natural Language Processing (NLP), a whole scientific field devoted to making computers understand text
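A minimal sketch of featurization, assuming the simplest representation (a bag of words): each document becomes a row of word counts, and the whole corpus becomes a document-term matrix. The tokenizer and the two example sentences are hypothetical; in practice, libraries such as scikit-learn's CountVectorizer do the same job with many more options.

```python
import re
from collections import Counter


def tokenize(text):
    # Deliberately simple tokenizer: lowercase, keep runs of letters only
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts term vocab[j] in document i."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))  # every distinct word in the corpus
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


# Hypothetical mini-corpus of two manifesto-like sentences
docs = [
    "We will support workers and families.",
    "Lower taxes for families and small businesses.",
]
vocab, dtm = document_term_matrix(docs)
print(vocab)
for row in dtm:
    print(row)
```

Word order is lost in this representation, which is the main trade-off of bag-of-words models: simple and transparent, but blind to context.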

Social group detection in party manifestos

Licht and Sczepanksi (2024)
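This example relies on supervised learning: a model learns, from hand-labelled texts, which sentences mention a social group. The sketch below is not the authors' method; it is a much simpler stand-in, a multinomial Naive Bayes classifier over word counts written in plain Python. The training sentences and labels are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # Log prior + sum of smoothed log likelihoods of each token
            score = math.log(n / n_docs)
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labelled sentences (1 = mentions a social group, 0 = does not)
train_texts = [
    "We will raise pensions for older people",
    "Support for young families and single parents",
    "Farmers deserve fair prices",
    "We will modernise the railway network",
    "Public finances must be balanced",
    "Invest in digital infrastructure",
]
train_labels = [1, 1, 1, 0, 0, 0]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("Better wages for young workers"))
```

With six training sentences the model is only a toy; real applications need thousands of labelled examples and an evaluation on held-out data.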

The meaning of “class” in books

CSS pipeline

Corpus selection

What is a corpus?

  • Every project involving text analysis starts with a corpus
  • Choosing the documents to analyze
  • A corpus is a collection of texts
  • A crucial and often overlooked task
  • Corresponds to identifying the population of interest

Corpus selection

  • Start from a research interest in a social phenomenon
  • What corpus would help me answer this question?
  • E.g., Twitter data is fun but not useful for everything
  • Corpus and metadata

  • How exhaustive is the corpus? Is it the whole population or only a sample?
  • Think about the data generation process: how were these texts produced?
  • Explore the corpus: read some of the texts
  • Think about the quantity of interest: what measure could you derive from these texts? E.g., how much a given topic is discussed in them
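A minimal sketch of a corpus stored with its metadata and of one crude quantity of interest: the share of documents mentioning a keyword, grouped by a metadata field. The documents, parties, and the keyword "climate" are hypothetical, and a single-keyword match is only a rough proxy for how much a topic is discussed.

```python
import re
from collections import defaultdict

# Hypothetical corpus: each document is stored with its metadata
corpus = [
    {"party": "A", "year": 2019, "text": "Climate change is the challenge of our time."},
    {"party": "A", "year": 2024, "text": "We will cut taxes and protect the climate."},
    {"party": "B", "year": 2019, "text": "Security and order come first."},
    {"party": "B", "year": 2024, "text": "A realistic climate policy must protect jobs."},
]


def mentions(text, keyword):
    # True if the keyword appears as a whole word (case-insensitive)
    return re.search(rf"\b{re.escape(keyword)}\b", text, flags=re.IGNORECASE) is not None


def share_mentioning(corpus, keyword, by):
    """Share of documents mentioning `keyword`, grouped by the metadata field `by`."""
    totals, hits = defaultdict(int), defaultdict(int)
    for doc in corpus:
        group = doc[by]
        totals[group] += 1
        hits[group] += mentions(doc["text"], keyword)  # True counts as 1
    return {g: hits[g] / totals[g] for g in totals}


print(share_mentioning(corpus, "climate", by="party"))
print(share_mentioning(corpus, "climate", by="year"))
```

Keeping metadata next to each text is what makes such comparisons (across parties, years, countries) possible later in the pipeline.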

Finding or creating a corpus

  • Existing corpora vs creating original corpora
  • Existing corpora:
    • Text datasets already available in that format: e.g., Manifesto Corpus, UNGA corpus, corpora collected by other researchers, open data
  • Collecting texts
    • API
    • Web scraping
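Web scraping usually means downloading pages and then extracting the relevant text from their HTML. A minimal sketch of the extraction step using only Python's standard library `html.parser`, assuming the text of interest sits in `<p>` elements (real sites vary). The page is a hypothetical inline string so the example runs offline; before scraping a real site, check its terms of use and robots.txt.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> elements of an HTML page."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> is kept as part of the paragraph
        if self.in_paragraph:
            self.paragraphs[-1] += data


def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    # Normalise whitespace and drop empty paragraphs
    return [" ".join(p.split()) for p in parser.paragraphs if p.strip()]


# In practice the HTML would be downloaded, e.g. with
# urllib.request.urlopen(url).read().decode("utf-8"); here it is inlined.
page = """
<html><body>
  <nav>Home | Press</nav>
  <h1>Press release</h1>
  <p>The party presents its <b>new</b> programme.</p>
  <p>Public transport will be free for students.</p>
</body></html>
"""
print(extract_paragraphs(page))
```

Navigation menus and headings are dropped because they are not inside `<p>` tags, which is often what you want when building a corpus of article texts.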

API

  • APIcalypse: Facebook, Instagram and Twitter researcher access has closed (see minet for a workaround)
  • Bluesky
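A minimal sketch of collecting posts through an API: build a query URL, send the request, and keep only the fields needed for the corpus. It assumes Bluesky's public XRPC host and its `app.bsky.feed.searchPosts` method; whether authentication is required can change, so check the official Bluesky API documentation before relying on it. The request function is not called below, and the sample response is hypothetical, so the example runs offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: public Bluesky AppView host; access rules may change
BASE_URL = "https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts"


def build_search_url(query, limit=25):
    # urlencode safely escapes spaces, accents, '#', etc.
    return f"{BASE_URL}?{urlencode({'q': query, 'limit': limit})}"


def fetch_json(url):
    # Performs the HTTP request; not called below so the sketch runs offline
    with urlopen(url) as response:
        return json.load(response)


def parse_posts(payload):
    # Keep only what the corpus needs: author handle and post text
    return [
        {"handle": post["author"]["handle"], "text": post["record"]["text"]}
        for post in payload.get("posts", [])
    ]


# Hypothetical response, shaped like the `posts` list the method returns
sample = {
    "posts": [
        {"author": {"handle": "alice.bsky.social"},
         "record": {"text": "Reading the new manifesto"}},
    ]
}
print(build_search_url("manifesto élection", limit=10))
print(parse_posts(sample))
```

Separating the request from the parsing makes it easy to store raw API responses first and re-parse them later, which helps reproducibility when access to an API closes.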

Licht, Hauke, and Ronja Sczepanksi. 2024. “Who Are They Talking about? Detecting Mentions of Social Groups in Political Texts with Supervised Learning.” ECONtribute Discussion Paper.